Abstract


The probabilistic performance of a number of algorithms for the NP-complete
Satisfiability Problem (SAT) has been investigated analytically and experimentally
using a fixed-clause-length model generating n clauses of k ≥ 3 literals taken from
r variables as well as a random-clause-length model generating n clauses containing
each of r variables independently with probability p. In the case of the
random-clause-length model one polynomial time algorithm has been shown to find a
solution to a random instance I of SAT with probability approaching 1 as n and r
get large when a solution exists for I. In the case of the fixed-clause-length model,
we have discovered an algorithm which almost always finds a solution to random,
satisfiable instances of SAT with k = 3. We have also shown that none of a wide
class of algorithms can verify unsatisfiability in polynomial time almost always.

We  have  also  studied  the  algorithm  of  Angluin  and  Valiant  for  the  NP-complete 
Hamiltonian  Circuit  problem.  It  was  shown  by  them  that  a  hamiltonian  circuit  can 
be found in random graphs which are hamiltonian with probability tending to 1
as  graph  size  tends  to  infinity.  We  have  found  that  their  algorithm  almost  never 
finds  hamiltonian  circuits  in  k-regular  graphs  which  are  hamiltonian.  On  the  other 
hand,  we  have  discovered  an  algorithm  which  often  finds  hamiltonian  circuits  in 
such graphs.


1. Research Objective


The  goal  of  this  research  is  to  develop  and  analyze  algorithms  which  can,  in  some 
practical  sense,  solve  NP-complete  problems  quickly.  NP-complete  problems  appear 
in  many  disciplines  such  as  Cryptology,  Operations  Research,  Artificial  Intelligence 
and  Computer  System  Design.  NP-complete  problems  are  the  “hardest”  of  a  class  of 
problems  known  as  NP.  Associated  with  each  NP  problem  we  consider  is  an  infinite 
set  of  instances.  Instances  may  take  the  form  of  graphs,  logic  expressions,  sets  or 
many  other  structures  depending  on  the  problem.  Each  instance  has  a  size  denoted 
by n. Although the size of an instance I may be formally defined as the number
of bits needed to efficiently encode I, for our purposes, we may regard the size of
I to be the number of distinct objects in I. So, for example, a graph containing E
edges and Q vertices has size n = E + Q. Associated with each instance I is a set
of variables, a set of values that can be assigned to each variable and a constraint
function U_I that maps value assignments to variables to {true, false}. For example,
if I is a graph with Q vertices we might associate Q − 1 variables which take edge
labels as values and a constraint function which has value true if and only if the
edge set corresponding to the assignment given to the variables is a spanning tree
of I. An assignment t such that U_I(t) = true is a solution to I. An algorithm solves
I if it determines whether or not a solution exists for I.

A  problem  in  NP  is  said  to  be  solved  efficiently  if  there  is  an  algorithm  which 
solves every instance of the problem in time bounded by a polynomial in n.
Unfortunately, there is no known computational scheme for efficiently solving any
NP-complete problem and it is considered highly unlikely that one will be found
(see [2] and [12]). Thus, every known method for solving an NP-complete problem
P fails to find the solution to some instances of P in a reasonable amount of time.
Furthermore, there is little hope that even an effective randomized algorithm (see
[13], [22] and [23]) will be found for any NP-complete problem since, as is well
known, this would imply an unlikely collapse of the polynomial hierarchy. However,
if a method M can be found to efficiently find solutions to all but a few instances
of P then M might be a practical method for solving P. We are looking for such
(M, P) pairs.


2.  Analytic  Tools 


We use probability theory to measure success in meeting our goal. A distribution
D  is  assigned  to  the  set  of  all  possible  instances  of  P  of  size  n  and  we  prove  one  of 
three  kinds  of  results  for  a  given  algorithm  M: 

a.  M  finds  a  solution  to  an  instance  of  P  chosen  randomly  according  to  D  in  time 
bounded  by  a  polynomial  in  n  with  probability  greater  than  some  positive 
constant  k  as  n  gets  large.  Then  we  say  M  efficiently  solves  P  in  bounded 
probability  under  D. 

b.  M  finds  a  solution  to  an  instance  of  P  chosen  randomly  according  to  D  in  time 
bounded  by  a  polynomial  in  n  with  probability  approaching  1  as  n  gets  large 
(we will say “with probability tending to 1”). Then we say M efficiently solves
P in probability under D.

c. M solves all of a large sample of instances of P chosen randomly according to
D in average time that is bounded by a polynomial in n as n gets large. Then
we say that M solves P in polynomial average time.

Results  of  type  (a)  are  weaker  than  results  of  type  (b)  and  results  of  type  (b) 
are  weaker  than  results  of  type  (c).  It  is  often  the  case  that  we  can  prove  a  weaker 
result but not a stronger one for a particular (M, P) pair under D. Although a
type (c) result is the most desirable type of result, even a type (b) result will allow
us to conclude that M, in some practical sense (at least under D), efficiently solves
P.  A  result  of  type  (a)  cannot  always  allow  us  to  make  the  same  conclusion  since 
k  may  be  very  small  (say  .01).  However,  many  algorithms  we  consider  proceed  by 
assigning  values  to  variables  in  some  order  which  is  decided  during  computation 
and  assignments  are  never  undone  either  totally  or  partially.  These  algorithms 
either  continue  until  all  variables  are  assigned  values  (in  which  case  a  solution  has 
been  obtained)  or  they  stop  prematurely  because  they  discover  that  every  set  of 
assignments  of  values  to  unassigned  variables  cannot  possibly  lead  to  a  solution  (in 
which  case  it  cannot  be  determined  whether  or  not  a  solution  exists).  A  property  of 
these  algorithms  is  that  the  next  variable  to  be  assigned  a  value  is  chosen  randomly 
from  a  large  group  of  possibilities.  Thus,  repeated  runs  of  such  algorithms  will 
execute  differently  and  possibly  give  different  results.  If  the  probability  that  a 
run finds a solution is bounded from below by a constant and all runs execute
independently then only a constant number of runs would be necessary for us to
solve a random instance of P with probability arbitrarily close to 1 (this can be
strengthened to a type (b) result if the number of runs is allowed to grow slightly
with n). Unfortunately, it is not the case that all runs execute independently.
However, for the algorithms we consider, the dependence is very weak and, according
to the results of our experiments, we are justified in supposing that a small number
of repeated runs of M will allow us to solve P with probability tending to 1. Thus,
a result of type (a) seems to translate to a result of type (b) for the kinds of
algorithms we consider. When referring to results of either type (a), (b) or (c) we
will sometimes use the phrase “probabilistically efficient”.
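
The repeated-runs argument can be made concrete with a small calculation. The sketch below assumes fully independent runs, which the text notes is only approximately true; the function name `runs_needed` and its parameters are illustrative and do not come from the report.

```python
import math


def runs_needed(success_prob: float, failure_target: float) -> int:
    """Smallest t such that (1 - success_prob)**t <= failure_target.

    Assumes every run succeeds independently with probability success_prob.
    """
    if not (0.0 < success_prob < 1.0 and 0.0 < failure_target < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    # (1 - q)^t <= delta  <=>  t >= ln(delta) / ln(1 - q)
    return math.ceil(math.log(failure_target) / math.log(1.0 - success_prob))
```

Under this independence assumption the number of runs depends only on the success bound and the target failure probability, not on n, which is why a constant number of runs suffices.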

Others have taken this approach for specific NP-complete problems. Algorithms
which are probabilistically efficient have been found for the Hamiltonian
Circuit problem ([1] and [19]), the Planar Traveling Salesman problem [18], the
Processor Scheduling problem [7], the Bin Packing problem [17] and other NP-complete
problems.


3. Finding Solutions to Instances of SAT: Old and New Results


The problem we are primarily interested in is the Satisfiability problem (SAT). An
instance I of SAT is a Boolean expression in Conjunctive Normal Form (CNF). A
CNF expression is a conjunction (logical and) of disjunctions (logical or) of literals
(a literal is a Boolean variable or its complement). A disjunction is also called
a clause. A solution to I, if one exists, is a truth assignment to the variables
associated with literals in I which causes I to have value true. The problem is to
find a solution to I, if one exists, or to determine that no solution to I exists.
An instance of SAT which has a solution is said to be satisfiable, otherwise the
instance is said to be unsatisfiable. SAT was the first problem shown to be
NP-complete and is closely related to problems in Artificial Intelligence, particularly in
the areas of Theorem Proving and Vision Analysis. Also, any problem in NP can
easily be transformed to SAT and transformations from SAT to other NP-complete
problems are often straightforward. SAT is, therefore, one of the more important
NP-complete problems.
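
As an illustration of these definitions, the following sketch checks whether a truth assignment is a solution to a CNF instance. The encoding is an assumption made for the example, not something the report specifies: variable v_i is the integer i, its complement is −i, a clause is a frozenset of such integers, and an instance is a list of clauses.

```python
from typing import Dict, FrozenSet, List

Clause = FrozenSet[int]  # e.g. frozenset({1, -2}) stands for (v_1 or not v_2)


def comp(literal: int) -> int:
    """Complement of a literal under the signed-integer encoding."""
    return -literal


def is_solution(instance: List[Clause], assignment: Dict[int, bool]) -> bool:
    """True iff every clause contains at least one literal made true."""

    def literal_true(literal: int) -> bool:
        value = assignment[abs(literal)]
        return value if literal > 0 else not value

    # An empty clause has no true literal, so it makes the instance false.
    return all(any(literal_true(l) for l in clause) for clause in instance)
```

For example, `is_solution([frozenset({1, -2}), frozenset({2, 3})], {1: True, 2: False, 3: True})` evaluates both disjunctions and returns `True`.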

Some favorable probabilistic results for algorithms which solve SAT have already
been obtained. Let V = {v_1, ..., v_r} be a set of r Boolean variables. A
random clause contains each possible literal v_1, ..., v_r, v_1', ..., v_r' with probability p
independently of the occurrence of any other literal. A fixed length random clause
of length k contains k distinct literals which are equally likely to be any k-subset of
the 2r literals associated with the variables of V such that no two literals are associated
with the same variable.
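
The two kinds of random clause can be sampled as follows. This is a minimal sketch under an assumed encoding (variable v_i is the integer i and its complement is −i); the function names are illustrative.

```python
import random
from typing import FrozenSet


def random_clause(r: int, p: float, rng: random.Random) -> FrozenSet[int]:
    """Include each of the 2r literals independently with probability p."""
    literals = [lit for v in range(1, r + 1) for lit in (v, -v)]
    return frozenset(lit for lit in literals if rng.random() < p)


def fixed_length_clause(r: int, k: int, rng: random.Random) -> FrozenSet[int]:
    """Pick k distinct variables uniformly, then complement each with prob. 1/2.

    This is uniform over the k-subsets of literals in which no two literals
    share a variable.
    """
    variables = rng.sample(range(1, r + 1), k)
    return frozenset(v if rng.random() < 0.5 else -v for v in variables)
```

An instance of n clauses is then a list such as `[fixed_length_clause(r, k, rng) for _ in range(n)]`. Note that `random_clause` may return an empty clause or a clause containing both a literal and its complement, as the model permits.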

The  random  clause  model  is  the  distribution  on  instances  of  SAT  where  each 
instance has n independently selected random clauses. In [3], [14], [15], [20] and
[21]  the  average  running  time  of  several  algorithms  for  SAT  is  obtained  under  the 
random  clause  model.  The  conditions  under  which  at  least  one  algorithm  runs  in 
polynomial  time  on  the  average  are  as  follows: 


1) $\lim_{r\to\infty} rp = 0$, $n > r\ln(2) / (-\ln((r + 1)p))$.

2)  limr_00  rp  =  oc,  limr_oo  p  =  0.  n  >  ln(2 )elrp/ep. 


3)  limp—oop  =  0,  np  <  \J )  c  constant. 

4)  limr_00  1/p  =  polynomial(r),  np  <  rep,  c  constant. 

5) n < c ln(r), c constant.

But, in [9] we showed that, under the random clause model, a randomly chosen
truth assignment to V nearly always is a solution to an instance of SAT when
a) p > ln(n)/r, and an instance of SAT nearly always has a clause containing no
literals (such an instance is unsatisfiable) when b) p < ln(n)/(2r). We also showed
that, when c) ln(n)/(2r) < p < ln(n)/r and nln(n) < ln(r), where a is any
constant greater than zero, a random instance of SAT has no variable which appears
in more than one clause with probability tending to 1. Since a clause containing no
variables is not satisfiable and since solutions to instances containing variables that
appear in at most one clause are trivial to find, we may conclude that instances of
SAT generated under any of a), b) or c) may be trivially solved.
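
The threshold p = ln(n)/(2r) for empty clauses can be checked with a short calculation. This is a sketch of the reasoning, not the argument of [9], and the approximation step assumes p tends to 0 quickly enough that $(1-p)^{2r} \approx e^{-2pr}$.

```latex
% A random clause omits each of the 2r literals independently with probability 1 - p.
\Pr[\text{a given clause is empty}] = (1-p)^{2r} \approx e^{-2pr}
% The n clauses are selected independently.
\Pr[\text{no clause is empty}] = \bigl(1-(1-p)^{2r}\bigr)^{n} \approx \exp\!\bigl(-n\,e^{-2pr}\bigr)
% Substituting p = c\ln(n)/r:
n\,e^{-2pr} = n^{1-2c}
\;\longrightarrow\;
\begin{cases}
\infty, & c < 1/2 \quad (\text{an empty clause exists with probability tending to } 1)\\
0, & c > 1/2 \quad (\text{no empty clause, with probability tending to } 1)
\end{cases}
```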

These results are significant because conditions a), b) and c) subsume
conditions 1) to 5) above. Thus, our results indicate that the previous results on
algorithms for SAT are favorable, not because the algorithms analyzed have some
special property which makes them fast in the probabilistic sense, but because the
assumed distribution generates instances which have the property that almost any
simple-minded algorithm can solve them efficiently almost all the time.

Let I be an instance of SAT and let c denote a clause of I. If v is a literal in I
then we use comp(v) to denote the complement of v. In order to express algorithms
for SAT succinctly we regard clauses of I to be subsets of literals {v_1, ..., v_r, v_1', ..., v_r'}
and I to be a collection of n of these subsets. In [16] we considered the following
algorithm for SAT:


A_1(I):

   While I ≠ ∅ and ∀c ∈ I, c ≠ ∅
      If there is a single-literal clause {u} ∈ I then v ← u
      Else choose a literal v randomly from L
      I ← {c − {comp(v)} : c ∈ I and v ∉ c}
      L ← L − {v, comp(v)}
   If I = ∅ then return(“satisfiable”)
   Else return(“give up”)

A_1 is very fast as it never assigns more than one value to each variable in I.
Implicit in A_1 is the assignment of value true to literal v and the assignment of
false to comp(v). Therefore A_1 implicitly finds a solution if it does not “give up”.
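
A runnable sketch of this unit-clause algorithm is given below. It assumes a signed-integer encoding (v_i is i, comp(v_i) is −i, clauses are frozensets) and assumes that L starts as the set of all 2r literals, which the pseudocode leaves implicit. The function name `a1` and the choice to return the set of literals made true are illustrative.

```python
import random
from typing import FrozenSet, List, Optional, Set


def a1(instance: List[FrozenSet[int]], r: int,
       rng: random.Random) -> Optional[Set[int]]:
    """Return the literals assigned true, or None for "give up"."""
    clauses = list(instance)
    # Assumption: L initially holds every literal over variables 1..r.
    unassigned = {lit for v in range(1, r + 1) for lit in (v, -v)}
    chosen: Set[int] = set()
    # Continue while clauses remain and none of them is empty.
    while clauses and all(len(c) > 0 for c in clauses):
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is not None:
            (v,) = unit                           # forced by a single-literal clause
        else:
            v = rng.choice(sorted(unassigned))    # random literal from L
        # Drop clauses satisfied by v; delete comp(v) from the rest.
        clauses = [c - {-v} for c in clauses if v not in c]
        unassigned -= {v, -v}
        chosen.add(v)
    return chosen if not clauses else None
```

Variables left unassigned when the loop ends may take either value. A return value of `None` says only that this run failed, not that the instance is unsatisfiable.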

Recently we have obtained the result that algorithm A_1 efficiently solves SAT
(does not “give up”) in probability under the random clause model when p =
c ln(n)/r, .5 < c < 1, and $\lim_{n,r\to\infty} n^{1-c}/r = 0$ [16]. In [16] it is also shown that
instances generated according to the random clause model have no solution with
probability tending to 1 when p = c ln(n)/r, .5 < c < 1, and $\lim_{n,r\to\infty} n^{1-c}/r = \infty$.
Note that A_1 has good probabilistic performance even when variables appear in
O(n ln(n)/r) clauses on the average.

These results are significant for two reasons. First, they say that A_1 almost
always finds a solution to a random instance of SAT (generated under the random
clause model) when one exists. Second, they demonstrate the power of the following
line in A_1:

   If there is a single-literal clause {u} ∈ I then v ← u

We have shown that A_1 performs very poorly (almost always gives up) without
this line when ln(n)/(2r) < p < ln(n)/r.


Other favorable results have been obtained under a fixed clause length
distribution. The fixed clause length model is the distribution on instances of SAT where
each instance has n independently selected fixed length random clauses of k literals.
The probability that a random instance of SAT under the fixed clause length model
is satisfiable tends to 0 as n and r tend to infinity if $n/r > -\ln(2)/\ln(1 - 2^{-k})$.
Furthermore, the average number of solutions to a random instance of SAT under
this model is exponential in r if $n/r < -\ln(2)/\ln(1 - 2^{-k})$. The probability that a
random instance of SAT under the fixed clause length model has at least one
solution tends to 1 if $n/r < -f(k)/\ln(1 - 2^{-k})$ where f(k) is monotonically increasing
with k toward an asymptotic value no greater than ln(2). Because the character of
instances changes so abruptly here, we refer to the point $n/r = -f(k)/\ln(1 - 2^{-k})$
as the flip point. In these studies k is assumed to be independent of n and r. Thus
it appears that the case where $\lim_{n,r\to\infty} n/r = \alpha$, where $\alpha$ is any constant greater
than zero, is particularly important when considering the fixed length clause model.
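
The ratio −ln(2)/ln(1 − 2^{-k}) comes from counting expected solutions: each of the 2^r truth assignments satisfies one fixed length random clause with probability 1 − 2^{-k}, so the expected number of solutions is 2^r (1 − 2^{-k})^n. The sketch below evaluates this quantity; the function names are illustrative.

```python
import math


def log2_expected_solutions(n: int, r: int, k: int) -> float:
    """log2 of the expected number of solutions, 2**r * (1 - 2**-k)**n."""
    return r + n * math.log2(1.0 - 2.0 ** -k)


def unsat_ratio(k: int) -> float:
    """Value of n/r above which the expected number of solutions tends to 0."""
    return -math.log(2.0) / math.log(1.0 - 2.0 ** -k)
```

For k = 3 the ratio is about 5.19. Below it the expected count grows exponentially in r, but a large average does not by itself guarantee that most instances are satisfiable, which is why the flip point is stated with f(k) rather than ln(2).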

A number of algorithms have been analyzed under the fixed clause length
model. It can be shown that a randomly guessed truth assignment will almost
never be a solution to a random instance of SAT under the fixed length clause
model if $\lim_{n,r\to\infty} n/r = \alpha$ where $\alpha > 0$. However, according to results in [11],
an algorithm based on the pure literal rule (a component of the well known
Davis-Putnam procedure [8]) efficiently solves SAT in probability under the fixed length
clause model when $\lim_{n,r\to\infty} n/r < 1$. Recently we have shown that this algorithm
can solve SAT efficiently in probability under the fixed clause length model only if
the limiting ratio n/r obeys $n/r < 1/(1 - k e^{-kn/2r})$. This bound is close to 1 for
even moderately large k; for example, if k = 6 then n/r < 1.07. These results are
even more interesting in light of the observation that stripping each clause of all
its literals except for two results in an instance of 2-SAT which can be solved in
polynomial time [12] and which almost always has a solution when n/r < 1. Since
a solution to such an instance of 2-SAT is also a solution to the unstripped instance
of SAT from which it was created and since almost all instances of 2-SAT have
solutions when n/r < 1, the trivial method of stripping literals performs about as
well under the fixed length clause model as the algorithm based on the pure literal
rule. The trivial method of stripping literals and the algorithm based on the pure
literal rule are both superior to an algorithm of [4] which partitions clauses of a
given instance I into groups so that no two clauses of different groups share literals
associated with the same variable, solves SAT for each group and combines the
solutions to each group to get the solution to I.


But, there are a number of algorithms that have been shown to perform much
better probabilistically under the fixed length clause model. In [5] we showed that
A_1 efficiently solves SAT in bounded probability under the fixed length clause model
when

$$\lim_{n,r\to\infty} n/r < \frac{2^{k-1}}{k}\left(\frac{k-1}{k-2}\right)^{k-2}, \qquad k \ge 3.$$

Notice that the expression on the right side of the inequality is $-O(1/k)/\ln(1 - 2^{-k})$
if k is large.


This result is significant for two reasons. First, we cannot make the claim
that A_1 almost always finds a solution to a random instance of SAT when one
exists, as we could in the case of the random clause model, since there is a large
gap between the flip point ($n/r = -O(1)/\ln(1 - 2^{-k})$) and the point where A_1
begins to work well probabilistically ($n/r = -O(1/k)/\ln(1 - 2^{-k})$) due to the 1/k
factor which appears in the latter term. Furthermore, for that range of n/r over
which A_1 is probabilistically efficient, it is only able to find solutions efficiently with
bounded probability whereas A_1 finds solutions efficiently in probability (almost all
instances) under the random clause length model. Thus, we see that, in some sense,
the fixed clause length model generates harder instances than the random clause
length model (at least as far as A_1 is concerned) and the results based on the latter
distribution do not map precisely to the same kind of results based on the former.

We also studied the following generalization of A_1:

A_2(I):

   Repeat
      Let c be a smallest clause in I
      Choose u randomly from c
      Remove from I all clauses containing u
      Remove from I all occurrences of comp(u)
   Until I is empty or there exist two complementary unit clauses in I
   If I is empty Then return (“satisfiable”)
   Otherwise return (“give up”)
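
The smallest-clause procedure above can be sketched in runnable form as follows. The encoding (v_i as the integer i, comp(v_i) as −i, clauses as frozensets) and the names `smallest_clause_heuristic` and `has_complementary_units` are assumptions made for illustration. The loop tests its exit condition before each iteration rather than after, which differs from Repeat/Until only on inputs that are already empty or already contain complementary unit clauses.

```python
import random
from typing import FrozenSet, List, Optional, Set


def has_complementary_units(clauses: List[FrozenSet[int]]) -> bool:
    """True iff some {u} and {comp(u)} are both present as unit clauses."""
    units = {next(iter(c)) for c in clauses if len(c) == 1}
    return any(-lit in units for lit in units)


def smallest_clause_heuristic(instance: List[FrozenSet[int]],
                              rng: random.Random) -> Optional[Set[int]]:
    """Return the literals assigned true, or None for "give up"."""
    clauses = list(instance)
    if any(len(c) == 0 for c in clauses):
        return None  # assumption: an empty input clause is treated as "give up"
    chosen: Set[int] = set()
    while clauses and not has_complementary_units(clauses):
        smallest = min(clauses, key=len)
        u = rng.choice(sorted(smallest))   # random literal of a smallest clause
        # Remove clauses containing u and all occurrences of comp(u).
        clauses = [c - {-u} for c in clauses if u not in c]
        chosen.add(u)
    return chosen if not clauses else None
```

Because a unit clause is always a smallest clause, the unit-clause rule is a special case of this selection rule. Every clause is removed only when it contains a chosen literal, so a non-`None` result satisfies the whole instance.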


In [5] we showed that A_2 efficiently solves SAT in bounded probability under
the fixed length clause model when

$$\lim_{n,r\to\infty} n/r < \frac{1.54 \cdot 2^{k-1}}{k+1}\left(\frac{k-1}{k-2}\right)^{k-2} \qquad \text{for } 4 \le k \le 40$$

and efficiently solves SAT in probability under the fixed length clause model when

$$\lim_{n,r\to\infty} n/r < \frac{0.92 \cdot 2^{k-1}}{k}\left(\frac{k-1}{k-2}\right)^{k-2} \qquad \text{for } 4 \le k \le 40.$$
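
To get a feel for the size of these two bounds, they can be evaluated numerically. The sketch below reads the first bound as 1.54 · 2^(k−1)/(k+1) · ((k−1)/(k−2))^(k−2) and the second as 0.92 · 2^(k−1)/k · ((k−1)/(k−2))^(k−2); that reading of the printed formulas is an assumption, as are the function names.

```python
def bounded_probability_bound(k: int) -> float:
    """Limit on n/r for success in bounded probability (formula as read above)."""
    if not 4 <= k <= 40:
        raise ValueError("the bound is stated only for 4 <= k <= 40")
    return 1.54 * 2 ** (k - 1) / (k + 1) * ((k - 1) / (k - 2)) ** (k - 2)


def in_probability_bound(k: int) -> float:
    """Limit on n/r for success in probability (formula as read above)."""
    if not 4 <= k <= 40:
        raise ValueError("the bound is stated only for 4 <= k <= 40")
    return 0.92 * 2 ** (k - 1) / k * ((k - 1) / (k - 2)) ** (k - 2)
```

For k = 4 these evaluate to about 5.54 and 4.14 respectively, so both bounds grow roughly like 2^k / k.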


These results are significant for three reasons. First, A_2 efficiently solves SAT in
probability (almost always) over about the same range of n/r that A_1 efficiently
solves SAT in bounded probability. Second, the range of n/r over which A_2 is
probabilistically efficient is only slightly greater than the range of n/r over which
A_1 is probabilistically efficient. Thus, although A_2 performs much better than A_1
probabilistically, there is still a wide gap between the flip point and the point at
which A_2 begins to perform well. Third, and most important, A_2 and A_1 are vastly
superior in probabilistic performance compared to algorithms that rely on certain
greedy heuristics to select the next variable to assign a value to. An example of
a greedy heuristic is “select the variable v for which the difference between the
number of occurrences of the literal v and the literal v' in I is greatest and assign
variable v the value which satisfies most clauses”. However, as we will see below,
greedy heuristics added to A_1 and A_2 improve the performance of those algorithms
significantly, especially for the case k = 3.

In the case k = 3 (CNF expressions with three literals per clause are instances of
the 3-Satisfiability problem which is also NP-complete) we have shown in [6] that the
maximum occurring literal selection heuristic (if there are no single-literal clauses
in I, select a variable randomly and assign it the value which satisfies most clauses)
used with A_1 efficiently solves SAT in bounded probability under the fixed length
clause model when $\lim_{n,r\to\infty} n/r < 2.9$. In the case k = 3, A_1 efficiently solves SAT
in bounded probability when $\lim_{n,r\to\infty} n/r < 2.66$. This may be compared with the
flip point (n/r = 4).


From  our  analysis  in  [5]  and  [6]  we  have  devised  the  following  new  algorithm 
for  SAT: 

A_3(I):

   Repeat
      If there is a single-literal clause {l} in I Then u ← l
      Otherwise u ← l* such that l* ∈ L and for all l ∈ L, w(l*) ≥ w(l)
      Remove from I all clauses containing u
      Remove from I all occurrences of comp(u)
      L ← L − {u, comp(u)}
   Until I is empty or there exist two complementary unit clauses in I
   If I is empty Then return (“satisfiable”)
   Otherwise return (“give up”)


where w(l), the weight of literal l, is determined as follows:

Let c be a clause in I and let μ_j(c) be a weighting function mapping clauses
to integers. Let us say that μ_j(c) is the weight of clause c at the end of the
jth iteration of A_3(I). Initially μ_0(c) = 1 for every clause c ∈ I. The clause
weighting function is updated as follows: if l is the literal chosen on the jth
iteration, W_j(l) is the total weight of clauses containing l at the start of the
jth iteration (these clauses will be removed) and N_j(l) is the number of clauses
containing comp(l) at the start of the jth iteration (one literal will be removed
from each of these clauses) then μ_j(c) = μ_{j−1}(c) + W_j(l)/N_j(l) if c contains
comp(l), μ_j(c) = 0 if c contains l and μ_j(c) = μ_{j−1}(c) otherwise. The literal
weighting function is

   w(l) = W_j(l)/N_j(l)
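
The weighting scheme is easier to follow in executable form. The sketch below is one possible reading of it, with several assumptions that the text does not settle: literals are signed integers (comp(l) is −l), clause weights are kept as floats, the candidate set L is restricted to literals that still occur in some clause, a literal whose complement occurs nowhere is given infinite weight, and ties are broken by taking the smallest literal. The name `weighted_literal_heuristic` is illustrative.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def weighted_literal_heuristic(
        instance: List[FrozenSet[int]]) -> Optional[Set[int]]:
    """Return the literals assigned true, or None for "give up"."""
    # Pair each clause with its current weight; initially every weight is 1.
    weighted: List[Tuple[FrozenSet[int], float]] = [(c, 1.0) for c in instance]
    if any(len(c) == 0 for c, _ in weighted):
        return None  # assumption: an empty input clause is treated as "give up"
    chosen: Set[int] = set()
    while weighted:
        units = {next(iter(c)) for c, _ in weighted if len(c) == 1}
        if any(-lit in units for lit in units):
            return None  # two complementary unit clauses
        total: Dict[int, float] = {}   # W(l): total weight of clauses containing l
        count: Dict[int, int] = {}     # number of clauses containing each literal
        for c, mu in weighted:
            for lit in c:
                total[lit] = total.get(lit, 0.0) + mu
                count[lit] = count.get(lit, 0) + 1

        def weight(lit: int) -> float:
            n_comp = count.get(-lit, 0)          # N(l): clauses containing comp(l)
            return total[lit] / n_comp if n_comp else math.inf

        # Unit clauses take priority; otherwise pick a maximum-weight literal.
        u = min(units) if units else max(sorted(total), key=weight)
        n_comp = count.get(-u, 0)
        bonus = total[u] / n_comp if n_comp else 0.0
        # Clauses containing u are removed; those containing comp(u) lose that
        # literal and inherit an equal share of the removed weight.
        weighted = [(c - {-u}, mu + bonus if -u in c else mu)
                    for c, mu in weighted if u not in c]
        chosen.add(u)
    return chosen
```

The effect of the update is that a clause which has already lost literals carries more weight, so literals that would satisfy such shortened clauses are preferred in later iterations.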

According to our experiments, A_3 solves SAT efficiently in bounded probability
under the fixed length clause model when $\lim_{n,r\to\infty} n/r < 4$.

The significance of this result is that A_3 appears to efficiently solve almost
all instances of 3-SAT. We hope to prove this result analytically and devise an
extension to A_3 which will provide similar performance for any fixed value of k.


4.  Verifying  Unsatisfiability 

Although  much  of  our  work  has  been  directed  toward  finding  algorithms  that  obtain 
solutions  when  they  exist,  we  are  also  interested  in  algorithms  that,  in  probability, 
efficiently  verify  the  unsatisfiability  of  instances  of  SAT  that  are  unsatisfiable.  The 
problem  of  verifying  unsatisfiability  seems  to  be  harder  than  the  problem  of  finding 
a  solution  when  one  exists  since  verification  seems  to  require  examination  of  many 
truth assignments to make sure that none is a satisfying assignment. Algorithm A_4
below represents a class of algorithms for verifying unsatisfiability.

A_4(I):

   If there is a clause in I which has value false Then return “unsatisfiable”
   If all clauses in I are true Then return “satisfiable”
   Select an unassigned variable v from V
   Let I_1 be the result on I of assigning true to v
   Let I_2 be the result on I of assigning false to v
   If A_4(I_1) and A_4(I_2) return “unsatisfiable” Then return “unsatisfiable”
   Else return “satisfiable”


The line “Select an unassigned...” allows A_4 to be any one of a wide class of
algorithms for verifying unsatisfiability by allowing any variable selection heuristic.
Furthermore, important algorithms for verifying unsatisfiability such as the
Davis-Putnam procedure have the form of A_4.
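
A direct transcription of this splitting scheme is sketched below, assuming literals are signed integers and clauses are frozensets. A clause "has value false" once all its literals have been deleted, and "all clauses are true" once every clause has been removed. The selection rule used here (take a variable from the first remaining clause) is a placeholder chosen for the example; any heuristic could be substituted. The names `assign` and `splitting_search` are illustrative.

```python
from typing import FrozenSet, List


def assign(instance: List[FrozenSet[int]], literal: int) -> List[FrozenSet[int]]:
    """Result of making `literal` true: drop satisfied clauses, delete -literal."""
    return [c - {-literal} for c in instance if literal not in c]


def splitting_search(instance: List[FrozenSet[int]]) -> str:
    if any(len(c) == 0 for c in instance):
        return "unsatisfiable"          # some clause has value false
    if not instance:
        return "satisfiable"            # every clause has value true
    # Placeholder variable selection heuristic (an assumption of this sketch).
    v = abs(next(iter(instance[0])))
    if (splitting_search(assign(instance, v)) == "unsatisfiable"
            and splitting_search(assign(instance, -v)) == "unsatisfiable"):
        return "unsatisfiable"
    return "satisfiable"
```

To report "unsatisfiable" the procedure must close both branches at every split, which is the source of the exponential running time discussed in this section.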

Unfortunately, we have found that, regardless of the variable selection heuristic
used, A_4 requires exponential time, almost always, to verify unsatisfiability if
instances are generated according to the fixed clause length model with the ratio
of n to r fixed (recall that this is the most important relationship between n and
r). Although pessimistic, this result is important because it shows us where not to
look for algorithms that verify unsatisfiability efficiently in probability.


5.  Hamiltonian  Circuits 


Given a graph G = (V, E) with |V| = n, a hamiltonian circuit in G is an ordering
⟨v_1, v_2, ..., v_n⟩ of vertices in V such that, for all 1 ≤ i < j ≤ n, v_i ≠ v_j and
⟨v_i, v_{i+1}⟩ is an edge in the edge set E and edge ⟨v_n, v_1⟩ is in E. A graph
that has a hamiltonian circuit is called hamiltonian. If a graph G is hamiltonian,
we see from the definition of hamiltonian circuit that there is a way in G to traverse
a sequence of adjacent edges in such a way that every vertex in G is visited once
and only once. The problem of finding a hamiltonian circuit in a graph (if one
exists) is a very important NP-complete problem that has received much study. In
[1] an efficient algorithm for the Hamiltonian Circuit problem was introduced and it
was shown that this algorithm almost always finds a hamiltonian circuit in random
graphs that are hamiltonian. This result has been regarded as evidence that the
Hamiltonian Circuit problem is tractable in some probabilistic sense.
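
The definition translates into a simple check of a proposed vertex ordering. This sketch assumes an undirected graph given as a collection of vertex pairs and at least three vertices; the function name is illustrative.

```python
from typing import Collection, Hashable, Iterable, Sequence, Tuple


def is_hamiltonian_circuit(vertices: Collection[Hashable],
                           edges: Iterable[Tuple[Hashable, Hashable]],
                           order: Sequence[Hashable]) -> bool:
    """True iff `order` visits every vertex once and consecutive pairs are edges."""
    n = len(vertices)
    if n < 3 or len(order) != n or set(order) != set(vertices):
        return False
    edge_set = {frozenset(e) for e in edges}   # undirected: ignore orientation
    # The modulus supplies the closing edge from the last vertex to the first.
    return all(frozenset((order[i], order[(i + 1) % n])) in edge_set
               for i in range(n))
```

Checking a given ordering takes time linear in the size of the graph; the difficulty of the Hamiltonian Circuit problem lies entirely in finding such an ordering.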

The result of [1] is based on the following graph distribution (for undirected
graphs): a random undirected graph has n vertices and each pair of vertices is
connected by an edge with probability p independently of any other edge connections.
This distribution is analogous to the random clause distribution for SAT. According
to a result of [1] there is an efficient algorithm which finds a hamiltonian circuit
in a random undirected graph in probability if p > (1 + ε) ln(n)/n where ε is a
small constant. It had already been known that no hamiltonian circuit exists in a
random graph with probability tending to 1 if p < ln(n)/n. Thus, the algorithm
of [1] finds a hamiltonian circuit in nearly all graphs that have one. Compare this
result with our results for SAT under the random clause model (here p, r and n
are the parameters of the random clause model, not the random graph model):
solutions to random instances can be found efficiently in probability when
p > ln(n)/r; random instances of SAT have no solution with probability tending to 1
if p < ln(n)/(2r); and if p = c ln(n)/r, .5 < c < 1, solutions to random instances
of SAT are found efficiently in probability if $\lim_{n,r\to\infty} n^{1-c}/r = 0$ and no solution
exists with probability tending to 1 if $\lim_{n,r\to\infty} n^{1-c}/r = \infty$.

As previously mentioned we have found that algorithms which work well
probabilistically under the random clause model do not necessarily work well under the
fixed clause length model. The reason appears to be that the random clause model
generates lots of “easy” instances. We illustrate this as follows: Let I(n, r, p) (or
I when the parameters are obvious) refer to a random instance of SAT generated
according to the random clause model with parameters n, r and p. If p is
proportional to 1/r the average number of literals per clause is constant. But, with
probability tending to 1 there is a zero literal clause in I(n, r, O(1/r)). Zero literal
clauses cause I to be unsatisfiable. As p is increased to, say, O(ln ln(n)/r) the
average number of literals per clause is now tending to infinity as r tends to infinity. A
huge number of literals per clause makes the job of finding a truth assignment that
satisfies lots of clauses much easier since more opportunities for doing so arise. But,
as those opportunities increase I remains unsatisfiable because there is still a zero
literal clause in I with probability tending to 1. If we increase p to a value close
to ln(n)/(2r) the opportunities for developing a truth assignment satisfying lots of
clauses become overwhelming but with probability tending to 1 there exists a zero
literal clause which keeps I unsatisfiable. Finally, if p is increased beyond the value
which forces a zero literal clause to be in I with probability tending to 1, there is
nothing to prevent I from being satisfiable and the huge average number of literals
per clause allows finding a satisfying truth assignment easily.

This  phenomenon  does  not  occur  with  the  fixed  clause  length  model  for  SAT. 
The  reason  is  that  there  is  no  possibility  of  zero  literal  clauses  so  instances  are 
satisfiable  even  when  the  number  of  literals  per  clause  is  constant.  But,  if  the 
number  of  literals  per  clause  is  constant  there  are  relatively  few  opportunities  for 
developing  a  truth  assignment  which  satisfies  all  clauses;  thus,  it  is  hard  to  do  so 
(instances  are  “hard”).  The  analog  of  the  fixed  clause  length  model  for  SAT  is  the 
k-regular  graph  model  for  (undirected)  graphs.  (A  k-regular  graph  is  an  undirected 
graph  such  that  every  vertex  has  degree  k.  The  k-regular  graph  model  assigns  equal 
probability  to  every  n  vertex  k-regular  graph.) 
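
For concreteness, the two definitions above can be sketched as follows. The clause generator assumes that each clause uses k distinct variables, each complemented with probability 1/2; the text does not fix these details, so they are assumptions. Sampling a k-regular graph uniformly at random is not attempted here; the second function only checks regularity of a given edge list. All names are hypothetical.

```python
import random
from collections import Counter


def random_fixed_length_instance(n, r, k, seed=None):
    """Return n clauses, each with exactly k literals over variables 1..r.

    A literal is the integer +v (variable v) or -v (its complement).
    Assumption: k distinct variables per clause, signs chosen uniformly.
    """
    rng = random.Random(seed)
    instance = []
    for _ in range(n):
        variables = rng.sample(range(1, r + 1), k)
        clause = [v if rng.random() < 0.5 else -v for v in variables]
        instance.append(clause)
    return instance


def is_k_regular(edges, num_vertices, k):
    """True if every vertex 0..num_vertices-1 has degree exactly k.

    `edges` is a list of (u, v) pairs of a simple undirected graph.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(degree[v] == k for v in range(num_vertices))
```

Every clause produced this way has exactly k literals, so a zero literal clause cannot occur, which is the property the discussion above relies on.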


From the above discussion it seems that the satisfiable instances generated
under the random clause model are easier to solve than many satisfiable instances
generated  under  the  fixed  clause  length  model.  The  result  is  that  some  algorithms 
which  work  well  probabilistically  under  the  random  clause  model  do  not  work  well 
under  the  fixed  clause  length  model.  We  have  verified  this  for  some  algorithms  for 
SAT. Since the fixed clause length and random clause models for SAT are
analogous to the k-regular and random graph models, respectively, we decided to check
whether the sensitivity of probabilistic performance to instance distribution
observed for SAT holds for the Hamiltonian Circuit problem. So far our results are
as expected: the algorithm of [1] performs poorly on instances of k-regular graphs
which have hamiltonian circuits. In fact, the algorithm of [1] almost never succeeds
in finding a hamiltonian circuit when one exists. Thus, the question of whether
the  Hamiltonian  Circuit  problem  is  tractable  in  some  probabilistic  sense  must  be 
reopened  and  reexamined.  We  have  begun  to  do  this  by  experimenting  with  other 
algorithms  for  the  Hamiltonian  Circuit  problem.  One  of  these  performs  much  better 
than the algorithm of [1] but does not find hamiltonian circuits often enough to get
excited over. A complete report on this matter will be prepared after more work
has been completed.


6.  Summary  of  New  Results 


We  have  obtained  the  following  results  during  the  past  year: 

1. Algorithm A1 efficiently solves SAT (does not “give up”) in probability under
the random clause model when p = c ln(n)/r, 0.5 < c < 1, and
$\lim_{n,r\to\infty} n^{1-c}/r = 0$ [16]. In [16] it is also shown that instances
generated according to the random clause model have no solution with probability
tending to 1 when p = c ln(n)/r, 0.5 < c < 1, and $\lim_{n,r\to\infty} n^{1-c}/r = \infty$.

These results are significant for two reasons. First, they say that A1 almost
always finds a solution to a random instance of SAT (generated under the
random clause model) when one exists. Second, they demonstrate the power of
the following line in A1:

If there is a single-literal clause (u) ∈ I then v ← u

We have shown that A1 performs very poorly (almost always gives up) without
this  line  when  ln(n)/(2r)  <  p  <  ln(n)/r. 

2.  From  our  analysis  in  [5]  and  [6]  we  have  devised  a  new  algorithm  for  SAT 
which  we  called  A3  in  section  3.  According  to  our  experiments,  A3  solves  SAT 
efficiently in bounded probability under the fixed clause length model when
$\lim_{n,r\to\infty} n/r < 4$.

The significance of this result is that A3 appears to efficiently solve almost all
instances  of  3-SAT  that  are  satisfiable. 

3.  We  have  shown  that  any  algorithm  for  verifying  unsatisfiability  of  the  kind 
represented by A4 requires exponential time with probability tending to 1 under
the  fixed  clause  length  model  if  the  ratio  of  n  to  r  is  fixed. 

This result is significant because it applies to a wide class of algorithms.


4. We have found that the algorithm of [1] for finding hamiltonian circuits in
random graphs performs poorly on hamiltonian k-regular graphs. We have
also found that another algorithm for obtaining hamiltonian circuits performs
much better than the algorithm of [1] on k-regular graphs but not well enough
to find hamiltonian circuits even most of the time.

From these results we see that an NP-complete problem that had been regarded
as tractable in the probabilistic sense may not be tractable after all. If not, the
reason may be that the distribution chosen in [1] for analysis is faulty in that
it allows too many “easy” instances to be generated. These results point out
the  need  for  further  research  aimed  at  being  able  to  distinguish  distributions 
that  generate  many  “easy”  instances  from  those  that  generate  mostly  “hard” 
instances. 
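
As an illustration of the single-literal clause rule quoted in result 1, the sketch below shows how such a rule can keep a simple non-backtracking procedure from giving up. This is not algorithm A1 itself; it is a hypothetical greedy procedure that prefers a single-literal clause when one exists, otherwise takes the first literal of the first clause (an arbitrary choice made only for illustration), and gives up when an empty clause appears. Literals are encoded as nonzero integers, with -v denoting the complement of variable v.

```python
def assign(clauses, literal):
    """Set `literal` true: drop satisfied clauses, delete its complement."""
    return [[x for x in c if x != -literal]
            for c in clauses if literal not in c]


def greedy_sat(clauses, use_unit_rule=True):
    """Return the list of literals set true, or None if the procedure gives up.

    Hypothetical illustration only; no backtracking is performed.
    """
    clauses = [list(c) for c in clauses]
    chosen = []
    while clauses:
        if any(len(c) == 0 for c in clauses):
            return None  # an empty clause cannot be satisfied: give up
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if use_unit_rule and unit is not None:
            literal = unit            # single-literal clause rule
        else:
            literal = clauses[0][0]   # arbitrary fallback choice
        chosen.append(literal)
        clauses = assign(clauses, literal)
    return chosen


if __name__ == "__main__":
    instance = [[1, 2], [-1], [-2, 3]]
    print(greedy_sat(instance))                       # uses the rule
    print(greedy_sat(instance, use_unit_rule=False))  # ignores the rule
```

On this small instance the version with the rule satisfies the clause (-1) first and succeeds, while the version without it sets variable 1 true, empties that clause, and gives up. The example mirrors the qualitative finding only and does not reproduce the probabilistic analysis.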